Discovering Demographic Language Variation
Abstract
We propose a Bayesian generative model of how demographic social factors influence lexical choice. We apply the method to a corpus of geo-tagged Twitter messages originating from mobile phones, cross-referenced against U.S. Census demographic data. Our method discovers communities jointly defined by linguistic and demographic properties.
Similar resources
Exploring Language Variation Across Europe - A Web-based Tool for Computational Sociolinguistics
Language varies not only between countries, but also along regional and socio-demographic lines. This variation is one of the driving factors behind language change. However, investigating language variation is a complex undertaking: the more factors we want to consider, the more data we need. Traditional qualitative methods are not well-suited to do this, and therefore restricted to isolated f...
Discovering Stylistic Variations in Distributional Vector Space Models via Lexical Paraphrases
Detecting and analyzing stylistic variation in language is relevant to diverse Natural Language Processing applications. In this work, we investigate whether salient dimensions of style variations are embedded in standard distributional vector spaces of word meaning. We hypothesize that distances between embeddings of lexical paraphrases can help isolate style from meaning variations and help i...
Generative Typology
This article lays out an approach that combines a formal-generative perspective on language, including tolerance of abstract analyses, with a typological focus on comparing unrelated languages from around the world. It argues that this can be a powerful combination for discovering linguistic universals and patterns in linguistic variation that are not detected by other means.
Late Talkers: Do Good Predictors of Outcome Exist?
Both small-scale and epidemiological longitudinal studies of early language delay indicate that most late talkers attain language scores in the average range by age 5, 6, or 7. However, late talker groups typically obtain significantly lower scores than groups with typical language histories on most language measures into adolescence. These findings support a dimensional account of language del...
Language and Variation: A Study of English and Persian Wh-questions
It was claimed by variationists that languages experience variation at all levels, which is supposed to be patterned. The present study aimed at exploring how variation occurred in English and Persian wh-questions. More specifically, it investigated whether such a variation was systematic and patterned. To this end, a modified version of the Edinburgh Map Task was used in data collection. The p...